Guided Policy Search with Delayed Senor Measurements

نویسندگان

  • Connor Schenck
  • Dieter Fox
چکیده

Guided policy search [1] is a method for reinforcement learning that trains a general policy for accomplishing a given task by guiding the learning of the policy with multiple guiding distributions. Guided policy search relies on learning an underlying dynamical model of the environment and then, at each iteration of the algorithm, using that model to gradually improve the policy. This model, though, often makes the assumption that the environment dynamics are markovian, e.g., depend only on the current state and control signal. In this paper we apply guided policy search to a problem with non-markovian dynamics. Specifically, we apply it to the problem of pouring a precise amount of liquid from a cup into a bowl, where many of the sensor measurements experience non-trivial amounts of delay. We show that, with relatively simple state augmentation, guided policy search can be extended to non-markovian dynamical systems, where the non-markovianess is caused by delayed sensor readings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Guided Policy Search as Approximate Mirror Descent

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...

متن کامل

Guided Policy Search via Approximate Mirror Descent

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...

متن کامل

A new memetic algorithm for mitigating tandem automated guided vehicle system partitioning problem

Automated Guided Vehicle System (AGVS) provides the flexibility and automation demanded by Flexible Manufacturing System (FMS). However, with the growing concern on responsible management of resource use, it is crucial to manage these vehicles in an efficient way in order reduces travel time and controls conflicts and congestions. This paper presents the development process of a new Memetic Alg...

متن کامل

A GUIDED TABU SEARCH FOR PROFILE OPTIMIZATION OF FINITE ELEMENT MODELS

In this paper a Guided Tabu Search (GTS) is utilized for optimal nodal ordering of finite element models (FEMs) leading to small profile for the stiffness matrices of the models. The search strategy is accelerated and a graph-theoretical approach is used as guidance. The method is evaluated by minimization of graph matrices pattern equivalent to stiffness matrices of finite element models. Comp...

متن کامل

Guided Policy Search

Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynam...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1609.03076  شماره 

صفحات  -

تاریخ انتشار 2016